Early diagnosis of brain metastases using cerebrospinal fluid cell‐free DNA‐based breakpoint motif and mutational features in lung cancer

In this study, 76.6% of lung cancer patients (62/81) were diagnosed with parenchymal BM with or without other types of CNS diseases by enhanced brain MRI and/or computerized tomography (CT) scan (Table S1). CSF cytology was performed for 71 patients initially admitted to our hospital as a complementary approach for diagnosing leptomeningeal metastasis. All 81 patients underwent lumbar puncture to collect CSF for targeted next-generation sequencing (NGS), followed by extraction of BPM and mutational features for modelling (Supplementary Material).
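The BPM feature extraction itself is described in the Supplementary Material, which is not reproduced here. As a rough illustration only, the sketch below assumes that a breakpoint motif is the first six bases at the 5' end of each cfDNA fragment. The 6-mers CGTTCG and GGAAAT reported later are consistent with a 6-bp motif, but the exact definition used in the study may differ. The function name `motif_frequencies` and the toy fragments are hypothetical.

```python
from collections import Counter

def motif_frequencies(fragments, k=6):
    """Return the relative frequency of each k-mer found at fragment 5' ends.

    Assumption: the motif is simply the first k bases of a fragment. Fragments
    shorter than k, or whose motif contains non-ACGT bases, are skipped.
    """
    counts = Counter()
    for seq in fragments:
        motif = seq[:k].upper()
        if len(motif) == k and set(motif) <= set("ACGT"):
            counts[motif] += 1
    total = sum(counts.values())
    return {m: c / total for m, c in counts.items()} if total else {}

# Toy fragments (made up for illustration, not study data).
toy = ["CGTTCGAAAT", "CGTTCGTTTA", "GGAAATCCCG", "NNAAATCCCG", "ACG"]
freqs = motif_frequencies(toy)
```

Normalising counts to frequencies makes the profile comparable across samples with different numbers of sequenced fragments, which is what a downstream classifier would need.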
According to the BM status and the relationship with follow-up time, the 81 patients were classified into three subgroups, including 62 POS patients (patients whose BM status was already positive at CSF sampling), 10 NEG patients (patients whose BM status was negative at CSF sampling and remained unchanged during the follow-up) and nine NTP patients (patients whose BM status turned from negative at CSF sampling to positive during the follow-up). As NTP patients were generally located between POS and NEG patients in the principal component analysis (Figure S1), we, therefore, assigned 70 patients with definitive BM status at CSF sampling (62 POS and eight randomly selected NEG) to the training cohort to develop the BM detection model and 11 patients (nine NTP and two randomly selected NEG) to the testing cohort for independent evaluation of the model performance (Figure 1A). Since the predictive model built solely on CSF ctDNA status showed a relatively high false-positive rate in detecting LCBM (Figure S2), we wondered if incorporating the ctDNA status feature into the model based on BPM features of CSF ctDNA using elastic-net logistic regression, hereafter referred to as "integrated model", could help improve the model performance (Figure 1B).
FIGURE 1 Schematic demonstration of the study design. (A) After excluding 15 patients with CSF samples that failed to pass the quality control (QC) examination and five patients lost to follow-up, sequencing data of 81 patients were analyzed to assess the potential of cerebrospinal fluid (CSF) circulating tumour DNA (ctDNA) in brain metastases (BM) prediction. Based on the BM status, these 81 patients were categorized into three subgroups: NEG, NTP and POS. The BM predictive model was developed based solely on the breakpoint motif (BPM) profile (BPM model) or combined CSF ctDNA status and BPM profile (integrated model). To train the machine learning model, 62 POS and eight randomly selected NEG patients were recruited into the training dataset. Leave-one-out cross-validation (LOOCV) was performed to evaluate the predictive performance of the BM predictive model, and the area under the curve (AUC) was calculated based on the holdout predictions. Independent testing of the model's performance was performed in the testing cohort comprising nine NTP and two randomly selected NEG patients. A higher predicted score indicates a higher probability of developing BM. Meanwhile, targeted next-generation sequencing (NGS) data of CSF ctDNA in 80 patients with known clinical outcome data were assessed to reveal genetic alterations related to BM aggressiveness. (B) Schematic demonstration of the integrated model construction. CSF samples were obtained from each participant through the lumbar puncture. A total of 474 cancer-related genes in CSF ctDNA were sequenced using high-throughput targeted NGS. The BPM profile was combined with each patient's CSF ctDNA status to generate the integrated machine-learning model using the elastic-net logistic regression algorithm.
[…] interval [CI]: 0.885-0.995), which was slightly better than the BPM model with an AUC of 0.929 (95% CI: 0.862-0.997, Figure 2A). Both models performed similarly in distinguishing lung cancer patients with different BM or ctDNA status (Figure 2B,C and Figure S3A,B). Furthermore, both models were tested against different patient subgroups and persisted in high performance regardless of patients' clinical characteristics, such as age, ctDNA status, smoking and treatment history (Figure 2D and Figure S3C). At 90% sensitivity, comparable high specificities were achieved by both models when tested in the matched cohorts (Figure 2E and Figure S3D).
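The training scheme described above (elastic-net logistic regression evaluated by LOOCV, with performance computed on the pooled held-out predictions) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the proximal-gradient solver, the hyperparameters (`alpha`, `l1_ratio`, `lr`, `n_iter`), the function names and the toy data are all hypothetical, and a real analysis would more likely rely on an established statistics library.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(w, t):
    # Proximal operator of the L1 penalty: shrink towards zero by t.
    return math.copysign(max(abs(w) - t, 0.0), w)

def fit_elastic_net_logreg(X, y, alpha=0.05, l1_ratio=0.5, lr=0.1, n_iter=300):
    """Minimise mean log-loss + alpha*(l1_ratio*|w|_1 + (1-l1_ratio)/2*|w|_2^2)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b  # the intercept is not penalised
        for j in range(p):
            # Gradient step on the smooth part (loss + L2), then L1 shrinkage.
            step = w[j] - lr * (grad_w[j] + alpha * (1 - l1_ratio) * w[j])
            w[j] = soft_threshold(step, lr * alpha * l1_ratio)
    return w, b

def predict_proba(model, x):
    w, b = model
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def loocv_scores(X, y, **fit_kwargs):
    """Hold out each sample once, fit on the rest, score the held-out sample."""
    scores = []
    for i in range(len(X)):
        model = fit_elastic_net_logreg(X[:i] + X[i + 1:], y[:i] + y[i + 1:], **fit_kwargs)
        scores.append(predict_proba(model, X[i]))
    return scores

# Toy data (made up): feature 0 separates the classes, feature 1 is uninformative.
X = [[1.0, 0.0], [2.0, 0.0], [1.5, 0.0], [-1.0, 0.0], [-2.0, 0.0], [-1.5, 0.0]]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_elastic_net_logreg(X, y)
held_out = loocv_scores(X, y)
```

Because each score in `held_out` comes from a model that never saw that sample, an AUC computed over the pooled list is not inflated by fitting and scoring on the same patients. The L1 part of the penalty leaves the uninformative coefficient at exactly zero.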
Next, we assessed our models' performance in the testing cohort comprising nine NTP and two NEG patients. Interestingly, the integrated model achieved an AUC of 0.833 (95% CI: 0.4681-1), whereas the BPM model performed slightly better, with an AUC of 0.944 (95% CI: 0.7905-1, Figure 3A). The lower AUC of the integrated model might be explained by the inclusion of three BM-negative patients who tested positive for CSF ctDNA (two in the training cohort and one in the testing cohort). These positive results may arise because CSF ctDNA indicates BM status earlier than conventional neurological imaging, either because the genomic changes have not yet caused organic pathological changes or because the organic disease cannot yet be detected at such an early stage.
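With nine NTP and two NEG patients there are only 9 × 2 = 18 positive-negative pairs, so a rank-based AUC can move only in coarse steps, which helps explain the wide confidence intervals. The reported values of 0.944 and 0.833 are numerically close to 17/18 and 15/18, which would correspond to one and three misordered pairs if no scores were tied. The sketch below computes the AUC as the fraction of correctly ordered pairs; the scores are invented for illustration and are not the study's predictions.

```python
def pairwise_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied pairs count as half a correct pair.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical risk scores: nine NTP (later BM-positive) and two NEG patients.
ntp = [0.95, 0.91, 0.88, 0.84, 0.80, 0.77, 0.71, 0.66, 0.40]
neg = [0.45, 0.20]
auc = pairwise_auc(ntp, neg)  # one NTP patient scores below one NEG patient
```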
Although the association was not statistically significant, higher risk scores tended to accompany shorter BM detection times in both models (Figure 3B and Figure S4A). It is worth noting that the BPM model not only distinguished all seven high-risk patients from low-risk patients but also outperformed the integrated model, with a lower false-negative rate in predicting BM among low-risk patients (BPM model: 50% versus integrated model: 66.7%; Figure 3C and Figure S4B). Additionally, the risk score computed by the BPM model better reflected BM-free survival (BMS) than that of the integrated model (Figure 3D and Figure S4C). Overall, these findings suggest that the BPM model performs better in predicting LCBM and that incorporating the CSF ctDNA status feature into the BPM model did not further improve its performance.
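The 50% figure for the BPM model is consistent with one simple reading of the testing cohort: if the seven high-risk calls were all NTP patients, the four remaining low-risk calls would contain two NTP and two NEG patients, so 2/4 of the low-risk calls later developed BM. The sketch below encodes that reading only. The patient ordering and the function name `low_risk_miss_rate` are assumptions, and the quantity computed, FN / (FN + TN), is strictly a false omission rate rather than a conventional false-negative rate.

```python
def low_risk_miss_rate(developed_bm, high_risk):
    """Fraction of low-risk calls that later developed BM: FN / (FN + TN)."""
    low = [bm for bm, hr in zip(developed_bm, high_risk) if not hr]
    return sum(low) / len(low) if low else float("nan")

# Assumed layout of the 11 testing patients: nine NTP (True) then two NEG (False),
# with seven NTP patients called high-risk (one reading of the BPM-model result).
developed_bm = [True] * 9 + [False] * 2
high_risk = [True] * 7 + [False] * 4
rate = low_risk_miss_rate(developed_bm, high_risk)
```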
To determine which motifs contributed most to the model's predictive power, we performed a hierarchical clustering analysis in the training cohort using motifs with non-zero coefficients in the BPM model (Figure 3E). The CGTTCG motif had the most positive coefficient and showed an upward trend across the three patient subgroups categorized by BM status (Figure 3F). In contrast, the GGAAAT motif, which had the most negative coefficient, showed the opposite trend in these patients (Figure 3G). In the testing cohort, a similar distribution pattern of the CGTTCG motif was observed (Figure S5). However, GGAAAT did not show the expected trend, possibly owing to the limited sample size.
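Selecting the motifs that drive an elastic-net model amounts to keeping the non-zero coefficients and ranking them by sign and magnitude. In the sketch below, the coefficient values are placeholders chosen only so that CGTTCG is the most positive and GGAAAT the most negative, as reported; the other motifs, all numeric values and the function name `rank_nonzero` are hypothetical.

```python
def rank_nonzero(coefs):
    """Return (motif, coefficient) pairs with non-zero weight, most positive first."""
    kept = [(motif, c) for motif, c in coefs.items() if c != 0.0]
    return sorted(kept, key=lambda mc: mc[1], reverse=True)

# Placeholder coefficients (not the fitted values from the study).
coefs = {"CGTTCG": 0.8, "AAAAAA": 0.0, "TTTACG": 0.2, "GGAAAT": -0.6, "CCCGGG": 0.0}
ranked = rank_nonzero(coefs)
most_positive, most_negative = ranked[0][0], ranked[-1][0]
```

Motifs whose coefficients were shrunk to exactly zero by the L1 penalty drop out, leaving a shorter feature list for clustering or plotting.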
Lastly, we performed comprehensive genomic profiling using CSF ctDNA mutational features to identify BM-associated genetic alterations in 80 lung cancer patients with known clinical outcomes. The most frequent genomic alterations were in the EGFR, TP53, RB1, CDKN2A, and CDKN2B genes (Figure 4A). Notably, we focused on alterations in the DNA-damage response (DDR)-related pathways because of their role in leptomeningeal metastasis development [10]. In univariate analysis, RB1 variants, EGFR amplification, and Fanconi Anemia (FA) pathway alterations were individually associated with BMS (Figure 4B-G and Table S2). RB1 variants and EGFR amplification in CSF ctDNA of lung cancer patients remained independently associated with an inferior prognosis in the multivariate model (P = 0.028 and 0.023, respectively; Figure 4H).
As a proof-of-concept pilot study exploring the clinical application of BPM profiling for the sensitive detection of BM with a machine-learning model in lung cancer patients, our study has a few limitations. The limited sample size may compromise the reliability of our BM predictive model. Expanding the cohort is warranted to improve statistical power and to allow a more accurate estimation of the risk score in lung cancer patients. In addition, although most patients in our study developed parenchymal BM during progression, the study cohort comprises various BM types owing to sample availability. The cfDNA and BPM profiles may differ between BM types and need to be further investigated. Therefore, we plan to conduct a more extensive study and develop a BPM model capable of identifying patients with different BM types, which may add significant clinical utility to the current model.
In summary, we established a robust BM predictive model using the BPM features of CSF ctDNA and profiled genomic alterations associated with BM in lung cancer patients. Our study provides insights into the potential use of CSF ctDNA sequencing for the early detection of LCBM and disease management.

ACKNOWLEDGEMENTS
We thank the patients, their families, and the investigators and research staff involved. This work was supported by the Natural Science Foundation of China (NSFC81872475, NSFC82073345) and the Jinan Clinical Medicine Science and Technology Innovation Plan (202019060).

CONFLICT OF INTEREST STATEMENT
Song Wang, Xiaoying Wu, Jiaohui Pang, Xi Song, Xiaojun Fan, Qiuxiang Ou, Yang Xu, Hua Bao and Yang Shao are employees of Nanjing Geneseeq Technology Inc. The remaining authors declare no conflict of interest.